Relational Distance-Based Clustering
Authors
Abstract
Work on first-order clustering has primarily been focused on the task of conceptual clustering, i.e., forming clusters with symbolic generalizations in the given representation language. By contrast, for propositional representations, experience has shown that simple algorithms based exclusively on distance measures can often outperform their concept-based counterparts. In this paper, we therefore build on recent advances in the area of first-order distance metrics and present RDBC, a bottom-up agglomerative clustering algorithm for first-order representations that relies on distance information only and features a novel parameter-free pruning measure for selecting the final clustering from the cluster tree. The algorithm can empirically be shown to produce good clusterings (on the mutagenesis domain) that, when used for subsequent prediction tasks, improve on previous clustering results and approach the accuracies of dedicated predictive learners.
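As a rough illustration of the bottom-up, distance-only idea described in the abstract, the following Python sketch builds a cluster tree from a precomputed pairwise distance matrix. It is not RDBC itself: the average-linkage rule, the function names, and the matrix input are assumptions made for this example, and neither the first-order distance metric nor the paper's parameter-free pruning measure is reproduced here.

```python
from itertools import combinations


def average_linkage(dist, a, b):
    """Mean pairwise distance between members of clusters a and b (assumed linkage rule)."""
    return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))


def agglomerate(dist):
    """Repeatedly merge the two closest clusters and return the merge history.

    dist is a symmetric list-of-lists matrix of pairwise distances.
    Each history entry is (cluster_a, cluster_b, linkage_distance),
    where clusters are tuples of item indices.
    """
    clusters = [(i,) for i in range(len(dist))]  # start with singletons
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(dist, *pair))
        history.append((a, b, average_linkage(dist, a, b)))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history
```

The returned history is the cluster tree in merge order. Selecting a final clustering from that tree is where a pruning measure such as the one the abstract mentions would apply; this sketch leaves that step out.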
Similar resources
Robust Extension of FCMdd-based Linear Clustering for Relational Data using Alternative c-Means Criterion
Relational clustering is an extension of clustering for relational data. Fuzzy c-Medoids (FCMdd) based linear fuzzy clustering extracts intrinsic local linear substructures from relational data. However, this linear clustering is sensitive to noise and outliers because it uses Euclidean distance. Alternative Fuzzy c-Means (AFCM) is an extension of Fuzzy c-Means, in which a modified distance meas...
A Hybrid Grey based Two Steps Clustering and Firefly Algorithm for Portfolio Selection
Considering the concept of clustering, the main idea of the present study is based on the fact that all stocks for choosing and ranking will not be necessarily in one cluster. Taking the mentioned point into account, this study aims at offering a new methodology for making decisions concerning the formation of a portfolio of stocks in the stock market. To meet this end, Multiple-Criteria Decisi...
Calculating distance measure for clustering in multi-relational settings
The paper deals with a distance based multi-relational clustering application in a real data case study. A novel method for a dissimilarity matrix calculation in multirelational settings has been proposed and implemented in R language. The proposed method has been tested by analyzing publications related to data mining subject and indexed in the medical index database MedLine. Clustering based ...
Relational Clustering
We introduce relational variants of neural gas, a very efficient and powerful neural clustering algorithm. It is assumed that a similarity or dissimilarity matrix is given which stems from Euclidean distance or dot product, respectively, however, the underlying embedding of points is unknown. In this case, one can equivalently formulate batch optimization in terms of the given similarities or d...
Hierarchical Model-Based Clustering for Relational Data
Relational data mining deals with datasets containing multiple types of objects and relationships that are presented in relational formats, e.g. relational databases that have multiple tables. This paper proposes a propositional hierarchical model-based method for clustering relational data. We first define an object-relational star schema to model composite objects, and present a method of fla...
A Hybrid Time Series Clustering Method Based on Fuzzy C-Means Algorithm: An Agreement Based Clustering Approach
In recent years, the advancement of information gathering technologies such as GPS and GSM networks has led to huge complex datasets such as time series and trajectories. As a result, it is essential to use appropriate methods to analyze the large raw datasets produced. Extracting useful information from large data sets has always been one of the most important challenges in different sciences,...